Book Layout Analysis: TOC Structure Extraction Engine
Internal identifier: 000A05 (Main/Exploration); previous: 000A04; next: 000A06
Authors: Bodin Dresevic [Serbia]; Aleksandar Uzelac [Serbia]; Bogdan Radakovic [Serbia]; Nikola Todic [Serbia]
Source:
- Lecture Notes in Computer Science [0302-9743]; 2009.
Abstract
Scanned and then OCRed documents usually lack detailed layout and structural information. We present a book-specific layout analysis system used to extract TOC structure information from scanned and OCRed books. This system was used for navigation purposes by the live books search project. We provide a labeling scheme for the TOC sections of books, a high-level overview of the book layout analysis system, and a description of the TOC Structure Extraction Engine. Finally, we present accuracy measurements of this system on a representative test set.
DOI: 10.1007/978-3-642-03761-0_17
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001C94
- to stream Istex, to step Curation: 001B75
- to stream Istex, to step Checkpoint: 000527
- to stream Main, to step Merge: 000A13
- to stream Main, to step Curation: 000A05
The document in XML format
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Book Layout Analysis: TOC Structure Extraction Engine</title>
<author><name sortKey="Dresevic, Bodin" sort="Dresevic, Bodin" uniqKey="Dresevic B" first="Bodin" last="Dresevic">Bodin Dresevic</name>
</author>
<author><name sortKey="Uzelac, Aleksandar" sort="Uzelac, Aleksandar" uniqKey="Uzelac A" first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name>
</author>
<author><name sortKey="Radakovic, Bogdan" sort="Radakovic, Bogdan" uniqKey="Radakovic B" first="Bogdan" last="Radakovic">Bogdan Radakovic</name>
</author>
<author><name sortKey="Todic, Nikola" sort="Todic, Nikola" uniqKey="Todic N" first="Nikola" last="Todic">Nikola Todic</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104</idno>
<date when="2009" year="2009">2009</date>
<idno type="doi">10.1007/978-3-642-03761-0_17</idno>
<idno type="url">https://api.istex.fr/document/3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001C94</idno>
<idno type="wicri:Area/Istex/Curation">001B75</idno>
<idno type="wicri:Area/Istex/Checkpoint">000527</idno>
<idno type="wicri:doubleKey">0302-9743:2009:Dresevic B:book:layout:analysis</idno>
<idno type="wicri:Area/Main/Merge">000A13</idno>
<idno type="wicri:Area/Main/Curation">000A05</idno>
<idno type="wicri:Area/Main/Exploration">000A05</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Book Layout Analysis: TOC Structure Extraction Engine</title>
<author><name sortKey="Dresevic, Bodin" sort="Dresevic, Bodin" uniqKey="Dresevic B" first="Bodin" last="Dresevic">Bodin Dresevic</name>
<affiliation wicri:level="1"><country xml:lang="fr">Serbie</country>
<wicri:regionArea>Microsoft Development Center Serbia, Makedonska 30, 11000, Belgrade</wicri:regionArea>
<wicri:noRegion>Belgrade</wicri:noRegion>
</affiliation>
<affiliation><wicri:noCountry code="no comma">E-mail: bodind@microsoft.com</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Uzelac, Aleksandar" sort="Uzelac, Aleksandar" uniqKey="Uzelac A" first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name>
<affiliation wicri:level="1"><country xml:lang="fr">Serbie</country>
<wicri:regionArea>Microsoft Development Center Serbia, Makedonska 30, 11000, Belgrade</wicri:regionArea>
<wicri:noRegion>Belgrade</wicri:noRegion>
</affiliation>
<affiliation><wicri:noCountry code="no comma">E-mail: aleksandar.uzelac@microsoft.com</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Radakovic, Bogdan" sort="Radakovic, Bogdan" uniqKey="Radakovic B" first="Bogdan" last="Radakovic">Bogdan Radakovic</name>
<affiliation wicri:level="1"><country xml:lang="fr">Serbie</country>
<wicri:regionArea>Microsoft Development Center Serbia, Makedonska 30, 11000, Belgrade</wicri:regionArea>
<wicri:noRegion>Belgrade</wicri:noRegion>
</affiliation>
<affiliation><wicri:noCountry code="no comma">E-mail: bogdan.radakovic@microsoft.com</wicri:noCountry>
</affiliation>
</author>
<author><name sortKey="Todic, Nikola" sort="Todic, Nikola" uniqKey="Todic N" first="Nikola" last="Todic">Nikola Todic</name>
<affiliation wicri:level="1"><country xml:lang="fr">Serbie</country>
<wicri:regionArea>Microsoft Development Center Serbia, Makedonska 30, 11000, Belgrade</wicri:regionArea>
<wicri:noRegion>Belgrade</wicri:noRegion>
</affiliation>
<affiliation><wicri:noCountry code="no comma">E-mail: nikola.todic@microsoft.com</wicri:noCountry>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2009</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104</idno>
<idno type="DOI">10.1007/978-3-642-03761-0_17</idno>
<idno type="ChapterID">17</idno>
<idno type="ChapterID">Chap17</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: Scanned then OCRed documents usually lack detailed layout and structural information. We present a book specific layout analysis system used to extract TOC structure information from the scanned and OCRed books. This system was used for navigation purposes by the live books search project. We provide labeling scheme for the TOC sections of the books, high level overview for the book layout analysis system, as well as TOC Structure Extraction Engine. In the end we present accuracy measurements of this system on a representative test set.</div>
</front>
</TEI>
<affiliations><list><country><li>Serbie</li>
</country>
</list>
<tree><country name="Serbie"><noRegion><name sortKey="Dresevic, Bodin" sort="Dresevic, Bodin" uniqKey="Dresevic B" first="Bodin" last="Dresevic">Bodin Dresevic</name>
</noRegion>
<name sortKey="Radakovic, Bogdan" sort="Radakovic, Bogdan" uniqKey="Radakovic B" first="Bogdan" last="Radakovic">Bogdan Radakovic</name>
<name sortKey="Todic, Nikola" sort="Todic, Nikola" uniqKey="Todic N" first="Nikola" last="Todic">Nikola Todic</name>
<name sortKey="Uzelac, Aleksandar" sort="Uzelac, Aleksandar" uniqKey="Uzelac A" first="Aleksandar" last="Uzelac">Aleksandar Uzelac</name>
</country>
</tree>
</affiliations>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A05 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A05 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:3FB188D3FE45799E2C7B6C48DF48E58CB6BA2104 |texte= Book Layout Analysis: TOC Structure Extraction Engine }}
This area was generated with Dilib version V0.6.32.